Abstract. Most of the consistency analyses of Bayesian procedures for
variable selection in regression refer to pairwise consistency, that is, 
consistency of Bayes factors. However, variable selection in regression 
is carried out in a given class of regression models where a natural 
variable selector is the posterior probability of the models. 

In this paper we analyze the consistency of the posterior model
probabilities when the number of potential regressors grows as the sample
size grows. The novelty in the posterior model consistency is that it 
depends not only on the priors for the model parameters through the 
Bayes factor, but also on the model priors, so that it is a useful tool 
for choosing priors for both models and model parameters. 

We have found that some classes of priors typically used in variable 
selection yield posterior model inconsistency, while mixtures of these 
priors improve this undesirable behavior. 

For moderate sample sizes, we evaluate Bayesian pairwise variable 
selection procedures by comparing their frequentist Type I and II error 
probabilities. This provides valuable information to discriminate between
the priors for the model parameters commonly used for variable
selection.

Key words and phrases: Bayes factors, Bernoulli model priors, g-priors,
hierarchical uniform model prior, intrinsic priors, posterior
model consistency, rate of growth of the number of regressors,
variable selection.


1. INTRODUCTION 

In some applications of regression models to complex problems, for instance, in genomics, clustering or change-point detection, the dimension of the parameter space of the sampling models is either very large or grows with the sample size. The question we address here is whether consistency of the Bayesian variable selection approach still holds in this setting. A partial answer to this question was given in Moreno, Giron and Casella (2010), where consistency of the Bayes factors (pairwise consistency) when the number of regressors $k$ increases with rate $k = O(n^b)$, $b \le 1$, was considered. It was proved there that any pair of nested regression models for which the Bayes factor has an asymptotic approximation equivalent to the BIC (Schwarz, 1978) is consistent for $b < 1$ but not for $b = 1$. Note that the BIC is a valid approximation for a wide class of prior distributions on the model parameters. It was also seen that the Bayes factor for the intrinsic priors considerably improves the BIC behavior for small or moderate sample sizes (Casella et al., 2009).

Nevertheless, variable selection in regression is a model selection problem in a class $\mathfrak{M}$ of $2^k$ normal regression models, and we ask whether the Bayes factor consistency when $k = O(n^b)$, $b < 1$, can be extended to posterior model consistency in the class of models $\mathfrak{M}$. The use of the posterior model probabilities as a variable selection procedure implies that variable selection is understood as a decision problem where the decision space $D$ and the space of states of nature $\mathfrak{M}$ are the same. Assuming a 0-1 loss function on the product space $D \times \mathfrak{M}$, the optimal decision is that of choosing the model with the highest posterior probability; other loss functions can also be used; see, for instance, the review paper by Clyde and George (2004).

Posterior model consistency in $\mathfrak{M}$ is understood as the convergence to one, in probability, of the sequence of the posterior probabilities of the true model. We are considering the true model to be the one from which the observations are drawn. We note that the frequentist and Bayesian consistency notions do not necessarily coincide. For instance, Shao (1997) defines a true model to be the submodel minimizing the average squared prediction error, and consistency of a model selection procedure means that the selected model converges in probability to this model.

From the necessary and sufficient conditions we give to achieve posterior model consistency it follows that Bayes factor consistency does not necessarily yield posterior model consistency. This was already pointed out by Johnson and Rossell (2012). Further, posterior model consistency of a Bayesian procedure in $\mathfrak{M}$ depends on the Bayes factor, the prior over the class of models $\mathfrak{M}$ and the rate of growth of $k$, and thus it has to be studied on a case-by-case basis.

The Bayes factors we review here are those obtained using the intrinsic priors on the model parameters (Berger and Pericchi (1996); Moreno (1997); Moreno, Bertolino and Racugno (1998)) and a couple of versions of Zellner's g-priors (Zellner and Siow (1980); Zellner (1986)). These versions include the g-prior with $g = n$ and the prior obtained as a mixture of g-priors with respect to the InverseGamma$(g | 1/2, n/2)$ distribution. This latter prior was recommended by Zellner and Siow (1980) and considered in Liang et al. (2008) and Scott and Berger (2010), among others. As we will see, these Bayes factors exhibit different dimension corrections, which suggests different behavior for moderate sample sizes, a point that we also explore here.

The priors over the set of models we review are the independent Bernoulli parametric class $\{\pi(M | \theta), 0 < \theta < 1\}$ introduced by George and McCulloch (1993) and a specific mixture of these priors which we refer to as the hierarchical uniform model prior $\pi^{HU}(M)$. This latter prior is a particular case of a set of hierarchically uniform priors considered by George and McCulloch (1993), who argued that “one may wish to weight more according the model size.”

Related posterior model consistency for variable selection in homoscedastic high-dimensional regression models was analyzed by Johnson and Rossell (2012). They considered Bayes factors for nonlocal priors on the regression parameters, an inverse gamma prior for the common variance error, and model priors such that $\pi(M_t)/\pi(M) > \varepsilon > 0$ for any $M \in \mathfrak{M}$, where $M_t$, the true model, is a fixed model. We note that the Bernoulli class of model priors and the hierarchical uniform model prior $\pi^{HU}(M)$ are excluded from their analysis. Further, the rate of growth of the number of regressors does not play a relevant role for the posterior model consistency of their Bayesian models, while for the Bayesian models considered here it does.

1.1 Notation 

Let $Y$ represent an observable random variable and $X_1, \ldots, X_k$ a potential set of explanatory regressors related through the normal linear model

\[
Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon_k, \qquad \varepsilon_k \sim N(0, \sigma_k^2),
\]

where the vector of regression coefficients $\beta_{k+1} = (\beta_0, \beta_1, \ldots, \beta_k)'$ and the variance error $\sigma_k^2$ are unknown. Let $(y, X)$ be the data set, where $y$ is a vector of $n$ independent observations of $Y$ and $X$ an $n \times (k+1)$ design matrix of full rank. This full sampling normal model $N_n(y | X\beta_{k+1}, \sigma_k^2 I_n)$ is denoted as $M_k$ and the simplest intercept-only normal model $N_n(y | \beta_0 1_n, \sigma_0^2 I_n)$ as $M_0$. We remark that the regression coefficients change across models, although for simplicity we use the same alphabetical notation.

It is convenient to split the class $\mathfrak{M}$ of regression models involved in variable selection as follows. By $\mathfrak{M}_j$ we denote the class of models with $j$ regressors, $0 \le j \le k$, the number of which is $\binom{k}{j}$, and by $M_j$ we denote a generic model in $\mathfrak{M}_j$ with sampling density $N_n(y | X_{j+1}\beta_{j+1}, \sigma_j^2 I_n)$, where $\beta_{j+1} = (\beta_0, \beta_1, \ldots, \beta_j)'$ is the unknown vector of regression coefficients, $X_{j+1}$ an $n \times (j+1)$ submatrix of $X$ and $\sigma_j^2$ the unknown variance error. Therefore, $\mathfrak{M} = \bigcup_{j=0}^{k} \mathfrak{M}_j$. The developments in the paper will be clear using this somewhat ambiguous, but simpler, notation.

1.2 Summary 

We find that when $k$ grows with $n$, the intrinsic priors for model parameters are preferred to either the g-prior for $g = n$ or the mixtures of g-priors, and the hierarchical uniform model prior is preferred to the Bernoulli model prior for any fixed value of the hyperparameter $\theta \in (0,1)$.

The rest of the paper is organized as follows. In Section 2 we give necessary and sufficient conditions to achieve posterior model consistency. In Section 3 we give asymptotic approximations to the Bayes factors for the g-priors with $g = n$, for the mixture of g-priors and for the intrinsic priors over the model parameters, for $k = O(n^b)$, $0 < b < 1$. In Section 4 posterior model consistency for the Bayesian procedures is presented. Section 5 contains a sampling evaluation of the three Bayes factors for moderate sample sizes. A summary of the conclusions is given in Section 6, and the Appendix contains the proofs of most of the results.


2. POSTERIOR MODEL CONSISTENCY

Given a data set $(y, X)$ coming from a linear model in $\mathfrak{M}$, and the priors for the models and model parameters $\{\pi(\beta_{j+1}, \sigma_j | M_j), \pi(M_j), M_j \in \mathfrak{M}_j, j = 0, 1, \ldots, k\}$, the posterior probability of a generic model $M_j$ can be written as

\[
\Pr(M_j | y, X) = \frac{B_{j0}(y, X)\,\pi(M_j)}{\sum_{i=0}^{k} \sum_{M_i \in \mathfrak{M}_i} B_{i0}(y, X)\,\pi(M_i)}, \tag{1}
\]

where $B_{j0}(y, X)$ denotes the Bayes factor for comparing models $M_j$ and $M_0$, which is given by

\[
B_{j0}(y, X) = \frac{\int N_n(y | X_{j+1}\beta_{j+1}, \sigma_j^2 I_n)\,\pi(\beta_{j+1}, \sigma_j | M_j)\,d\beta_{j+1}\,d\sigma_j}{\int N_n(y | \beta_0 1_n, \sigma_0^2 I_n)\,\pi(\beta_0, \sigma_0 | M_0)\,d\beta_0\,d\sigma_0}.
\]

The advantage of the posterior model probability in expression (1) is that it only involves Bayes factors for nested models. The variable selection procedure that uses this posterior model probability as model selector is called encompassing from below variable selection (Giron et al., 2006). We may also use the encompassing from above approach in which all the Bayes factors considered are of the form $B_{jk}(y, X)$ (Casella and Moreno (2006)). Both methods give similar results, and in this paper we will consider the encompassing from below approach.

Definition. Posterior model consistency when sampling from model $M_t$ holds if the limit in probability $[M_t]$ of the random variables $\{\Pr(M_j | y, X), M_j \in \mathfrak{M}\}$ is such that

\[
\lim_{n \to \infty} \Pr(M_j | y, X) =
\begin{cases}
1, & \text{if } M_j = M_t, \\
0, & \text{if } M_j \neq M_t,
\end{cases}
\qquad [M_t].
\]

A necessary and sufficient condition to achieve posterior model consistency when sampling from $M_t$ is given in the next theorem.

Theorem 1. When sampling from $M_t \in \mathfrak{M}_t$, posterior model consistency holds if and only if the equality

\[
\lim_{n \to \infty} \sum_{j=0}^{k} \sum_{\substack{M_j \in \mathfrak{M}_j \\ M_j \neq M_t}} \frac{B_{j0}(y, X)\,\pi(M_j)}{B_{t0}(y, X)\,\pi(M_t)} = 0, \qquad [M_t], \tag{A}
\]

holds.

Proof. The assertion follows from expression (1). □

Theorem 1 implies that, if the Bayes factor $B_{t0}(y, X)$ is inconsistent under $M_t$, then posterior model consistency under $M_t$ does not hold. We note that when $k$ is bounded, posterior model consistency holds for virtually any prior over the models (Casella et al. (2009)). However, when $k = O(n^b)$, $0 < b < 1$, it is apparent from (A) that posterior model consistency crucially depends on the rate of convergence to zero, under $M_t$, of the ratio $[B_{j0}(y, X)\pi(M_j)]/[B_{t0}(y, X)\pi(M_t)]$.

Under the null model $M_0$, the necessary and sufficient condition (A) reduces to

\[
\lim_{n \to \infty} \sum_{j=1}^{k} \sum_{M_j \in \mathfrak{M}_j} \frac{B_{j0}(y, X)\,\pi(M_j)}{\pi(M_0)} = 0, \qquad [M_0], \tag{B}
\]

and it follows that if for some $M_j$ the Bayes factor $B_{j0}(y, X)$ is inconsistent under $M_0$, then posterior model consistency under $M_0$ does not hold. It is clear that, when $k = O(n^b)$, $0 < b < 1$, the rate of convergence to zero of $B_{j0}(y, X)\pi(M_j)$ determines the posterior model consistency.

Thus, from Theorem 1 it is clear that when $k$ increases as the sample size $n$ increases, posterior model consistency is a more stringent notion than that of Bayes factor consistency. Furthermore, posterior model consistency depends on the specific Bayes factors $B_{j0}$ and the prior on the class of models $\mathfrak{M}$, and, consequently, it has to be established on a case-by-case basis.
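
Expression (1) and the sum in condition (A) can be evaluated directly once the Bayes factors and the model priors are available. The following is a minimal sketch, not the authors' code: the function names and the dictionary-based model labels are assumptions made for illustration, and the computation is carried out on the log scale because Bayes factors easily overflow for large $n$.

```python
import math


def posterior_model_probs(log_bf, log_prior):
    """Posterior model probabilities as in expression (1).

    log_bf[m]    : log B_{m0}(y, X) for model m (0.0 for the null model)
    log_prior[m] : log pi(M_m)
    The keys are hypothetical model labels shared by both dicts.
    """
    log_w = {m: log_bf[m] + log_prior[m] for m in log_bf}
    top = max(log_w.values())  # subtract the maximum before exponentiating
    total = sum(math.exp(v - top) for v in log_w.values())
    return {m: math.exp(v - top) / total for m, v in log_w.items()}


def condition_A_sum(log_bf, log_prior, true_model):
    """Sum in condition (A): weight of every wrong model relative to M_t."""
    ref = log_bf[true_model] + log_prior[true_model]
    return sum(
        math.exp(log_bf[m] + log_prior[m] - ref)
        for m in log_bf
        if m != true_model
    )
```

Writing $S$ for the value returned by `condition_A_sum`, expression (1) gives $\Pr(M_t | y, X) = 1/(1 + S)$, so posterior model consistency is exactly the statement that $S$ tends to zero in probability.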


3. PRIORS AND BAYES FACTORS FOR VARIABLE SELECTION

In this section we present priors for the parameters of the models and priors over the class of models which are commonly used in variable selection. We give formulae for the Bayes factors and their asymptotic approximations when sampling from an arbitrary but fixed model $M_t$ and rate of growth $k = O(n^b)$ for $0 < b < 1$.

3.1 Intrinsic Priors for Model Parameters

The intrinsic priors were introduced to justify the intrinsic Bayes factor (Berger and Pericchi, 1996). The original conditions defining the intrinsic priors given by Berger and Pericchi (1996) render a class of intrinsic priors (Moreno, 1997), and a limiting procedure for choosing a specific pair of intrinsic priors for model selection was proposed in Moreno, Bertolino and Racugno (1998). This procedure is based on the additional requirement that the intrinsic priors derived from improper priors, which are not necessarily proper, are a limit of proper intrinsic priors.

Bayes factors for intrinsic priors were used for variable selection in regression in Moreno and Giron (2005), Casella and Moreno (2006), Giron et al. (2006), Leon-Novelo, Moreno and Casella (2012), Consonni, Forster and La Rocca (2015), among others, and this variable selection procedure improves upon the Schwarz approximation for finite sample sizes (Casella et al., 2009) and asymptotically for high-dimensional regression models (Moreno, Giron and Casella, 2010).

The standard intrinsic method for comparing the null model $M_0$ versus the alternative $M_j$, starting from the improper reference prior for the parameters of the models $M_0$ and $M_j$, provides the proper intrinsic prior for the parameters $(\beta_{j+1}, \sigma_j)$, conditional on a null point $(\alpha_0, \sigma_0)$, as

\[
\pi^I(\beta_{j+1}, \sigma_j | \alpha_0, \sigma_0) = N_{j+1}\bigl(\beta_{j+1} \,\big|\, \tilde{\alpha}_0, (\sigma_j^2 + \sigma_0^2) W_{j+1}^{-1}\bigr)\, HC^{+}(\sigma_j | \sigma_0),
\]

where $\tilde{\alpha}_0 = (\alpha_0, 0')'$, $W_{j+1}^{-1} = \frac{n}{j+2}(X_{j+1}' X_{j+1})^{-1}$, and

\[
HC^{+}(\sigma_j | \sigma_0) = \frac{2}{\pi}\,\frac{\sigma_0}{\sigma_j^2 + \sigma_0^2}
\]

is the half Cauchy distribution on $R^{+}$ with location parameter 0 and scale $\sigma_0$. The unconditional intrinsic prior with respect to the reference prior $\pi^N(\alpha_0, \sigma_0) = c_0/\sigma_0$ is then given by

\[
\pi^I(\beta_{j+1}, \sigma_j) = \int \pi^I(\beta_{j+1}, \sigma_j | \alpha_0, \sigma_0)\,\pi^N(\alpha_0, \sigma_0)\,d\alpha_0\,d\sigma_0.
\]

For comparing model $M_0$ versus $M_j$ the intrinsic priors are the pair $(\pi^N(\alpha_0, \sigma_0), \pi^I(\beta_{j+1}, \sigma_j))$. We note that $\pi^I(\beta_{j+1}, \sigma_j)$ depends on the arbitrary constant $c_0$ that cancels out in the Bayes factor $B_{j0}(y, X)$, and hence no tuning hyperparameters have to be adjusted. Thus, the Bayes factors for intrinsic priors are automatically constructed from the sampling models and the reference priors.

3.2 Zellner’s g-Priors for Model Parameters 

For variable selection with the g-priors we also use the encompassing from below approach (the encompassing from above version is given in Scott and Berger (2010)). A basic assumption on the regression models for constructing the g-priors is that the intercept and the variance error are common parameters to all models, which reduces the number of parameters involved when comparing $M_j$ versus $M_0$ from $j + 4$ to $j + 2$. According to this restriction, the regression parameters of a generic model $M_j$ will be denoted as $(\beta_0, \beta_j')' = (\beta_0, \beta_1, \ldots, \beta_j)'$ and the variance error as $\sigma^2$, where $\beta_0$ and $\sigma$ are common to all models.

For a sample $(y, X)$, most references to g-priors in the variable selection literature (Berger and Pericchi (2001); Clyde and George (2000); George and Foster (2000); Fernandez, Ley and Steel (2001); Hansen and Yu (2001); Liang et al. (2008), among others) refer to them as the pair $(\pi^N(\beta_0, \sigma), \pi^g(\beta_j | \sigma))$, where

\[
\pi^N(\beta_0, \sigma) \propto \frac{1}{\sigma}\, 1_{R \times R^{+}}(\beta_0, \sigma)
\]

is the reference prior, and

\[
\pi(\beta_j | \sigma, g) = N_j\bigl(\beta_j \,\big|\, 0_j, g\sigma^2 (X_j' X_j)^{-1}\bigr),
\]

$g$ being an unknown positive hyperparameter, and $X_j$ the matrix of dimensions $n \times j$ resulting from suppressing the first column in the design matrix $X_{j+1}$ of the original formulation of the regression model $M_j$.

The conjugate property of these priors makes the expression of the Bayes factor quite simple, and it is well known that the hyperparameter $g$ plays an important role in the behavior of the Bayes factor. Several values for $g$ have been suggested, although none of them satisfies all the reasonable requirements (Berger and Pericchi (2001); Clyde and George (2004); Clyde, Parmigiani and Vidakovic (1998); Fernandez, Ley and Steel (2001); George and Foster (2000); Hansen and Yu (2001); Liang et al. (2008)). For instance, large $g$ values induce the Lindley-Bartlett paradox (Bartlett, 1957), and a fixed value for $g$ induces inconsistency, which can be corrected if $g$ is made dependent on $n$.

We consider two versions of the g-prior. The first is the one obtained for $g = n$, which is justified on the grounds that it provides a consistent Bayes factor, and it is a “unit information prior” (Kass and Wasserman, 1995). The second g-prior version was derived to avoid an incoherent property of the g-prior detected by Berger and Pericchi (2001): the Bayes factor for comparing $M_j$ versus $M_0$ for the g-prior does not tend to infinity as the coefficient of determination of $M_j$ tends to one. A way to avoid this behavior is to integrate the conditional g-priors $\{\pi(\beta_j | \sigma, g), g > 0\}$ to obtain the mixture of g-priors

\[
\pi^{Mix}(\beta_j | \sigma) = \int_0^{\infty} \pi(\beta_j | \sigma, g)\,\pi(g)\,dg,
\]

where

\[
\pi(g) = \frac{(n/2)^{1/2}}{\Gamma(1/2)}\, g^{-3/2} \exp\left(-\frac{n}{2g}\right).
\]

This mixture has been considered by some other authors, including Clyde and George (2004), Liang et al. (2008) and Scott and Berger (2010).


3.3 Priors for Models 


Since $\mathfrak{M}$ is a discrete space, a natural default prior over it is the uniform prior, but, as we will see, it is not a good prior when $k = O(n^b)$, $1/2 < b < 1$. A generalization of the uniform prior is the parametric independent Bernoulli prior class (George and McCulloch (1993)), for which the probability of a generic model $M_j$ containing $j$ out of $k$ regressors, $j \le k$, is given by

\[
\pi(M_j | \theta) = \theta^j (1 - \theta)^{k-j}, \qquad 0 < \theta < 1,
\]

where $\theta$ is an unknown hyperparameter, the meaning of which is the probability of inclusion of a regressor in the model. The prior $\pi(M_j | \theta)$ assigns the same probability to models with the same dimension, and, in particular, for $\theta = 1/2$ the uniform prior is obtained.

If we assume a uniform distribution for $\theta$, the unconditional probability of model $M_j$ is given by

\[
\pi^{HU}(M_j) = \int_0^1 \theta^j (1 - \theta)^{k-j}\,d\theta = \binom{k}{j}^{-1} \frac{1}{k+1}.
\]

If we decompose this probability as

\[
\pi^{HU}(M_j) = \pi^{HU}(M_j | \mathfrak{M}_j)\,\pi^{HU}(\mathfrak{M}_j),
\]

it follows that the model prior distribution, conditional on the class $\mathfrak{M}_j$, is uniform, and the marginal over the classes $\{\mathfrak{M}_j, j = 0, 1, \ldots, k\}$ is also uniform. It therefore seems appropriate to call this prior the hierarchical uniform prior.

We will see that the variable selection procedure that uses the hierarchical prior $\pi^{HU}(M_j)$ outperforms the one using the prior $\pi(M_j | \theta)$, for any value of $\theta$.
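
The two model priors are simple enough to tabulate, which makes the decomposition above easy to check numerically. The following is a small illustrative sketch; the function names are assumptions introduced here and do not come from the paper.

```python
import math


def bernoulli_model_prior(j, k, theta):
    """pi(M_j | theta): prior of ONE model with j of the k regressors."""
    return theta ** j * (1.0 - theta) ** (k - j)


def hierarchical_uniform_prior(j, k):
    """pi^HU(M_j) = 1 / ((k + 1) * C(k, j)): Bernoulli prior averaged over
    theta ~ Uniform(0, 1)."""
    return 1.0 / ((k + 1) * math.comb(k, j))


def class_prior_mass(prior_of_one_model, j, k):
    """Total prior mass of the class of all C(k, j) models with j regressors."""
    return math.comb(k, j) * prior_of_one_model
```

With $k = 10$, for example, `class_prior_mass(hierarchical_uniform_prior(j, 10), j, 10)` returns $1/11$ for every $j$, whereas under the Bernoulli prior the class masses are the Binomial$(k, \theta)$ probabilities.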

3.4 Bayes Factors 


For the data $(y, X)$, it can easily be seen that the Bayes factor for comparing $M_j$ versus $M_0$ for the g-prior with $g = n$ is given by

\[
B_{j0}^{g=n}(y, X) = \frac{(1 + n)^{(n-j-1)/2}}{(1 + n\mathcal{B}_{j0})^{(n-1)/2}}, \tag{2}
\]

for the mixture of g-priors by

\[
B_{j0}^{Mix}(y, X) = \frac{(n/2)^{1/2}}{\Gamma(1/2)} \int_0^{\infty} \frac{(1 + g)^{(n-j-1)/2}}{(1 + g\mathcal{B}_{j0})^{(n-1)/2}}\, g^{-3/2} \exp\left(-\frac{n}{2g}\right) dg, \tag{3}
\]

and for the intrinsic priors by

\[
B_{j0}^{IP}(y, X) = \frac{2}{\pi}\,(j + 2)^{j/2} \int_0^{\pi/2} \frac{\sin^j \varphi\,\bigl(n + (j + 2)\sin^2 \varphi\bigr)^{(n-j-1)/2}}{\bigl(n\mathcal{B}_{j0} + (j + 2)\sin^2 \varphi\bigr)^{(n-1)/2}}\,d\varphi. \tag{4}
\]

The integrals on $(0, \infty)$ in (3) and on $(0, \pi/2)$ in (4) do not have explicit expressions and require numerical integration.

We note that all these Bayes factors depend on the data through the statistic $\mathcal{B}_{j0}$, which is the ratio of the residual sums of squares of models $M_j$ and $M_0$, that is,

\[
\mathcal{B}_{j0} = \frac{y'(I - H_j)y}{y'\bigl(I - (1/n) 1_n 1_n'\bigr)y}, \tag{5}
\]

where $H_j$ is the hat matrix associated with $X_j$.

We observe that each Bayes factor exhibits a different dimension correction, and this suggests that for small or moderate sample sizes their behavior might be different, a point that we explore later in Section 5.
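
Since (4) requires numerical integration, a short sketch may help to see how the Bayes factors can be evaluated in practice. It covers only (2) and (4); the function names, the `panels` parameter and the choice of a midpoint rule on the log scale are assumptions made for illustration and are not taken from the paper.

```python
import math


def log_bf_g_equals_n(stat, n, j):
    """log of expression (2); `stat` is the statistic (5), a value in (0, 1]."""
    return 0.5 * (n - j - 1) * math.log1p(n) - 0.5 * (n - 1) * math.log1p(n * stat)


def log_bf_intrinsic(stat, n, j, panels=4000):
    """log of expression (4), midpoint rule over (0, pi/2) on the log scale."""
    h = (math.pi / 2) / panels
    logs = []
    for i in range(panels):
        s2 = math.sin((i + 0.5) * h) ** 2  # sin^2(phi) at the panel midpoint
        logs.append(
            0.5 * j * math.log(s2)  # sin^j(phi) = (sin^2 phi)^(j/2)
            + 0.5 * (n - j - 1) * math.log(n + (j + 2) * s2)
            - 0.5 * (n - 1) * math.log(n * stat + (j + 2) * s2)
        )
    top = max(logs)  # log-sum-exp to avoid overflow for large n
    log_integral = top + math.log(sum(math.exp(v - top) for v in logs) * h)
    return math.log(2 / math.pi) + 0.5 * j * math.log(j + 2) + log_integral
```

Both functions return the logarithm of the Bayes factor, so a posterior probability under equal prior odds can be recovered as $1/(1 + e^{-\log B_{j0}})$. A production implementation would probably use an adaptive quadrature routine instead of a fixed grid.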

For large sample sizes $n$, useful analytic approximations to the above Bayes factors are given in the next lemma.

Lemma 1. For large sample sizes $n$, $k = O(n^b)$ and $0 \le b \le 1$, the following approximations hold for any $j \le k$:


(hi) 


(8) B 


ip , 
j o ' 



-i/2 

B 

J+‘2j 

J 

. 

fj + 2 

• exp< 

2 


— (n—1)/2 

'j 0 


1 - 


B 


jo 


for b < 1, 


1 + - 


n 


3 + 2 


{n-j- 1)/2 


1 , nB J0 

3 + 2 
for b= 1. 


-(n-l)/2 


Proof. See Appendix A. □ 


The next theorem summarizes the fact that the three Bayes factors have an equivalent expression for large sample sizes $n$ and a bounded potential number of regressors $k$. Further, this expression is the one obtained by Schwarz (1978).

Theorem 2. When $k$ is bounded, then, for large sample sizes $n$, the Bayes factors in (2), (3) and (4) are equivalent to the Schwarz approximation, that is,

\[
B_{j0}^{g=n} \approx B_{j0}^{Mix} \approx B_{j0}^{IP} \approx \exp\left\{-\frac{j}{2}\log n - \frac{n}{2}\log \mathcal{B}_{j0}\right\}.
\]


(i) 


( 6 ) B 9 


=n 

j 0 


n -j/2 B jn/2 

for b < 1, 


exp 


1 - 


n j/2 S j0 n/2 exp<j -( 1 


for b= 1, 


1 

W 

1 

Wo 


11 


(7) B 


Mix , 
•j0 ' 


n 


j/2 £ (n i 2)/2 r(( J + 1)/ 2 ) 


j o 


F(l/2) 


for b < 1, 

\ -i/2 

»-(n-j-2)/2 

2 ) 1° 


1 + -Bj 0 


n 


- 0 + 1)/2 


r((j + i)/2) 

F(l/2) : 

for 6 = 1, 


Proof. The proof follows from Lemma 1 and some algebraic manipulations. □

Theorem 2 implies that for low-dimensional regular models, any Bayes factor is consistent, as the Schwarz approximation guarantees Bayes factor consistency. In this setting, for any positive model prior, posterior model consistency under an arbitrary model $M_t$ also holds.

However, for high-dimensional models the Schwarz approximation does not necessarily guarantee either Bayes factor consistency or posterior model consistency. Approximating forms other than that of Schwarz appear in this latter setting.
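
For the g-prior with $g = n$ the comparison with the Schwarz approximation can be checked numerically, because (2) is available in closed form. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the comparison is made on the log scale for a fixed dimension $j$ and a fixed value of the statistic.

```python
import math


def log_bf_g_equals_n(stat, n, j):
    """log of the g = n Bayes factor in expression (2)."""
    return 0.5 * (n - j - 1) * math.log1p(n) - 0.5 * (n - 1) * math.log1p(n * stat)


def log_schwarz(stat, n, j):
    """log of the Schwarz approximation: -(j/2) log n - (n/2) log(stat)."""
    return -0.5 * j * math.log(n) - 0.5 * n * math.log(stat)


def log_scale_gap(stat, n, j):
    """Exact log Bayes factor minus the log of its Schwarz approximation."""
    return log_bf_g_equals_n(stat, n, j) - log_schwarz(stat, n, j)
```

For fixed $j$ and a fixed value of the statistic, expanding the logarithms in (2) shows that this gap stays bounded as $n$ grows, while both log Bayes factors grow linearly in $n$; in that sense the two expressions agree to leading order when $k$ is bounded.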

3.5 Asymptotic Approximations to the Bayes Factors

The Bayes factor approximations in (6), (7) and (8) depend on the random sequence $\{\mathcal{B}_{j0}, n \ge 1\}$ given in (5). In this section we go a step further and use the asymptotic distribution of the statistic $\mathcal{B}_{j0}$ under an arbitrary but fixed model $M_t$ to approximate the Bayes factors $B_{j0}^{g=n}$, $B_{j0}^{Mix}$ and $B_{j0}^{IP}$. We first note that the asymptotic distribution of $\mathcal{B}_{j0}$ under $M_t$, a doubly noncentral beta distribution, depends on the limit of the pseudo-distance between models defined as

\[
\delta_n(M_t, M_j) = \frac{1}{2\sigma_t^2}\,\frac{\beta_t' X_t'(I_n - H_j) X_t \beta_t}{n}.
\]

General properties of this pseudo-distance have been studied in Giron et al. (2010). This pseudo-distance $\delta_n(M_t, M_j)$ can be simplified as follows. We first write the covariance matrix of the joint set of covariates of the models $M_t$ and $M_j$, the dimensions of which are $(t + j) \times (t + j)$, as

\[
\Sigma^{(n)}_{t+j} =
\begin{pmatrix}
S_{tt}^{(n)} & S_{tj}^{(n)} \\
S_{jt}^{(n)} & S_{jj}^{(n)}
\end{pmatrix},
\]

where the matrices $S_{tt}^{(n)}$, $S_{jj}^{(n)}$ are positive definite. Let us consider the matrices $S_{t \cdot j}^{(n)} = S_{tt}^{(n)} - S_{tj}^{(n)} (S_{jj}^{(n)})^{-1} S_{jt}^{(n)}$ and $S_{t \cdot j} = \lim_{n \to \infty} S_{t \cdot j}^{(n)}$. Then it can be seen that $\delta_n(M_t, M_j)$ can be expressed as

\[
\delta_n(M_t, M_j) = \frac{1}{2\sigma_t^2}\,\beta_t' S_{t \cdot j}^{(n)} \beta_t.
\]

In what follows we denote $\delta^*(M_t, M_j) = \lim_{n \to \infty} \delta_n(M_t, M_j)$, and if there is no confusion, $\delta^*(M_t, M_j)$ and $\delta_n(M_t, M_j)$ will be simply written as $\delta^*_{tj}$ and $\delta_{tj}$.

For any $M_j$, using the asymptotic distribution of $\mathcal{B}_{j0}$ under $M_t$, we can now provide asymptotic approximations in probability $[M_t]$ to the Bayes factors $B_{j0}^{g=n}$, $B_{j0}^{Mix}$ and $B_{j0}^{IP}$ that we summarize in Lemma 2.

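Lemma 2. When sampling from a model $M_t$, the Bayes factors in (2), (3) and (4) for $j \le k = O(n^b)$, $b \le 1$, can be approximated for large $n$ as

\[
B_{j0}^{g=n} \approx
\begin{cases}
n^{-j/2} \left(\dfrac{1 + \delta^*_{tj}}{1 + \delta^*_{t0}}\right)^{-n/2} \exp\left\{\dfrac{\delta^*_{tj} - \delta^*_{t0}}{2(1 + \delta^*_{tj})}\right\}, & \text{for } b < 1, \\[2ex]
n^{-j/2} \left(\dfrac{1 + \delta^*_{tj} - j/n}{1 + \delta^*_{t0}}\right)^{-n/2} \exp\left\{\dfrac{\delta^*_{tj} - \delta^*_{t0} - j/n}{2(1 + \delta^*_{tj} - j/n)}\right\}, & \text{for } b = 1,
\end{cases}
\tag{9}
\]

\[
B_{j0}^{Mix} \approx
\begin{cases}
\left(\dfrac{ne}{j + 1}\right)^{-j/2} \left(\dfrac{1 + \delta^*_{tj}}{1 + \delta^*_{t0}}\right)^{-(n-j-2)/2}, & \text{for } b < 1, \\[2ex]
\left(\dfrac{ne}{j + 1}\right)^{-j/2} \left(\dfrac{1 + \delta^*_{tj} - j/n}{1 + \delta^*_{t0}}\right)^{-(n-j-2)/2}, & \text{for } b = 1,
\end{cases}
\tag{10}
\]

and

\[
B_{j0}^{IP} \approx
\begin{cases}
\left(\dfrac{n}{j + 2}\right)^{-j/2} \left(\dfrac{1 + \delta^*_{tj}}{1 + \delta^*_{t0}}\right)^{-(n-1)/2}, & \text{for } b < 1, \\[2ex]
\left(1 + \dfrac{n}{j}\right)^{(n-j-1)/2} \left(\dfrac{(n/j)(1 + \delta^*_{tj}) + \delta^*_{t0}}{1 + \delta^*_{t0}}\right)^{-(n-1)/2}, & \text{for } b = 1.
\end{cases}
\tag{11}
\]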


Proof. The proof follows from Lemma 1 and the asymptotic distribution of the statistic $\mathcal{B}_{j0}$ under model $M_t$ (Casella et al., 2009), and it is omitted. □


From Lemma 2 it follows that when sampling from the null model $M_0$, that is, when $M_t = M_0$, the asymptotic approximations (9), (10) and (11) notably simplify, as they only depend on $n$ and the dimension $j$ of the model, irrespective of the particular set of covariates. This means that, under the null model $M_0$, the above Bayes factors are asymptotically constant across models in the class $\mathfrak{M}_j$.

To prove some results on posterior model consistency when sampling from an alternative model $M_t$, we need to know for which models $M_j$ in $\mathfrak{M}$ the pseudo-distance $\delta^*(M_t, M_j)$ is zero. This result follows from Lemma 3.


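Lemma 3. (i) For any model $M_j$ such that $\dim(M_j) < \dim(M_t)$, we have that

\[
\delta^*(M_t, M_j) > 0.
\]

(ii) For any model $M_j$ such that $\dim(M_j) = \dim(M_t)$,

\[
\delta^*(M_t, M_j)
\begin{cases}
= 0, & \text{if } M_j = M_t, \\
> 0, & \text{if } M_j \neq M_t.
\end{cases}
\]

(iii) For any model $M_j$ such that $\dim(M_j) > \dim(M_t)$,

\[
\delta^*(M_t, M_j)
\begin{cases}
= 0, & \text{if } M_t \text{ is nested in } M_j, \\
> 0, & \text{otherwise.}
\end{cases}
\]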
Proof. We note that (a) if $M_t$ and $M_j$ do not have common covariates, then the matrix $\Sigma_{t+j} = \lim_{n \to \infty} \Sigma^{(n)}_{t+j}$ is positive definite, and hence $S_{t \cdot j}$ is positive definite, and (b) if $M_t$ and $M_j$ do have common covariates, then it can be seen that


where $P$ is a positive definite matrix of dimensions $\max\{0, \dim M_t - \dim M_j\}$. We observe that if either $\dim M_t = \dim M_j$ or $M_t$ is nested in $M_j$, we have that $S_{t \cdot j} = O$. The proof of Lemma 3 follows from (a) and (b) and the fact that all regression coefficients $\beta_t$ in model $M_t$ are different from zero. □

It is interesting to remark that for $b < 1$ and any $M_j$ such that $\delta^*(M_t, M_j) > 0$, the rate of convergence in probability $[M_t]$ to zero of $B_{j0}^{g=n}$, $B_{j0}^{Mix}$ and $B_{j0}^{IP}$ for $M_t \neq M_0$ is exponentially fast, but the rate of convergence in probability $[M_0]$ to zero for $j \neq 0$ is only polynomially fast, that is, of the order of a negative power of $n$. This is in line with the result for $b = 0$ obtained by Dawid (2011) (for discrete data see Consonni, Forster and La Rocca (2015)).

4. POSTERIOR MODEL CONSISTENCY FOR $k = O(n^b)$ AND $0 < b < 1$

Posterior model consistency results for the six Bayesian variable selection procedures defined by the Bayes factors $B_{j0}^{g=n}$, $B_{j0}^{Mix}$, $B_{j0}^{IP}$, the Bernoulli model prior $\pi(M_j | \theta)$ and the hierarchical uniform prior $\pi^{HU}(M_j)$, when sampling from an arbitrary but fixed model $M_t$, are summarized in Theorem 3. For simplicity, the posterior model consistency results for the case when sampling from model $M_0$ and from an alternative model $M_t$ are not separated. However, we keep in mind that the rate of convergence of the posterior model probabilities when sampling from $M_0$ is different from the rate of convergence when sampling from $M_t \neq M_0$.

Theorem 3. (i) When sampling from $M_t$ and $k = O(n^b)$, the Bayesian procedures for the Bayes factors $B_{j0}^{g=n}$, $B_{j0}^{Mix}$, $B_{j0}^{IP}$ and the Bernoulli model prior are posterior model consistent for $0 < b < 1/2$ and posterior model inconsistent for $1/2 < b < 1$.

(ii) When sampling from $M_t$ and $k = O(n^b)$, the Bayesian procedures for the Bayes factors $B_{j0}^{g=n}$, $B_{j0}^{Mix}$ and $B_{j0}^{IP}$ and the hierarchical uniform prior are posterior model consistent for $0 < b < 1$.


Proof. See Appendix B. □ 

It is interesting to observe that the Bernoulli prior $\pi(M_j | \theta)$, conditional on $\theta$, induces a Binomial distribution on the classes $\mathfrak{M}_j$ which, in turn, by the change of variables $x = j/k$, induces a distribution on $x \in [0, 1]$ which converges in probability to a Dirac delta at $\theta$, as $k$ tends to infinity. In other words, for large values of $k$ the Bernoulli prior concentrates around models which have a proportion of covariates close to $\theta$. Therefore, this apparently innocuous prior conveys too much prior information about the proportion of covariates of the models, and thus it makes the posterior model probabilities for $1/2 < b < 1$ inconsistent. This wrong asymptotic behavior is corrected by the hierarchical uniform prior.
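
The concentration of the Bernoulli prior around $\theta$ is easy to see numerically. The sketch below is an illustration only: the function names and the tolerance `eps` are assumptions introduced here. It computes the prior probability that the proportion $j/k$ of included covariates lies within `eps` of $\theta$, under the Bernoulli prior and under the hierarchical uniform prior.

```python
import math


def bernoulli_mass_near_theta(k, theta, eps):
    """P(|j/k - theta| <= eps) when class sizes follow Binomial(k, theta)."""
    total = 0.0
    for j in range(k + 1):
        if abs(j / k - theta) <= eps:
            log_pmf = (
                math.lgamma(k + 1) - math.lgamma(j + 1) - math.lgamma(k - j + 1)
                + j * math.log(theta) + (k - j) * math.log(1.0 - theta)
            )
            total += math.exp(log_pmf)
    return total


def hu_mass_near_theta(k, theta, eps):
    """Same probability when every class size j has prior mass 1 / (k + 1)."""
    inside = sum(1 for j in range(k + 1) if abs(j / k - theta) <= eps)
    return inside / (k + 1)
```

As $k$ grows with `theta` and `eps` fixed, the first probability approaches one, while the second stays close to the length of the interval around $\theta$, which is the contrast described in the paragraph above.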

5. SMALL SAMPLE COMPARISONS 

Given a Bayes factor $B_{j0}$ for the models $\{M_0, M_j\}$, the decision of choosing model $M_j$ when $\Pr(M_j | B_{j0}) > 1/2$ is an optimal decision under $\Pr(M_0) = \Pr(M_j) = 1/2$ and a 0-1 loss function. We recall that, for a uniform prior on the class of models, ranking the models in the class according to their posterior model probabilities is equivalent to the ranking produced by the Bayes factor. In spite of this, a sampling analysis of the optimal statistical decision function has long been called for [see, e.g., Fraser (2011) and discussions therein]. From expressions (2), (3) and (4) it is apparent that the dimension corrections of the Bayes factors for the g-prior with $g = n$, for the mixture of g-priors and for the intrinsic priors are different from each other. This suggests that their sampling behavior for small and moderate sample sizes might be different.

In this section we study the sampling properties of the posterior model probabilities for $\Pr(M_0) = \Pr(M_j) = 1/2$ and the Bayes factors $B_{j0}^{g=n}$, $B_{j0}^{Mix}$ and $B_{j0}^{IP}$. We recall that the posterior probability $\Pr(M_j | y, X)$ for any of these Bayes factors depends on the data $(y, X)$ through the same statistic $\mathcal{B}_{j0}$, which takes values in the interval $(0, 1)$. Therefore, the critical regions for rejecting the null model $M_0$ for these Bayesian procedures are

\[
R_{j0}^{g=n} = \{\mathcal{B}_{j0} : \Pr(M_j | B_{j0}^{g=n}) > 1/2\},
\]

\[
R_{j0}^{Mix} = \{\mathcal{B}_{j0} : \Pr(M_j | B_{j0}^{Mix}) > 1/2\}
\]

and

\[
R_{j0}^{IP} = \{\mathcal{B}_{j0} : \Pr(M_j | B_{j0}^{IP}) > 1/2\}.
\]
Fig. 1. Type I error probabilities for the intrinsic procedure
(continuous line), the g-prior with g = n (dot-dashed) and the
mixture of g-priors (dashed).

These critical regions are in $(0, 1)$ and, since the posterior probabilities are monotone functions of $\mathcal{B}_{j0}$ (decreasing, as is apparent from (2), (3) and (4)), they are intervals. Using the distribution of the statistic $\mathcal{B}_{j0}$ under $M_0$ and $M_j$, we can compute the exact value of the Type I and II error probabilities as a function of the model dimension $j$ and the sample size $n$. Figure 1 shows the Type I error probabilities of the optimal decision rule associated with the regions $R_{j0}^{g=n}$, $R_{j0}^{Mix}$ and $R_{j0}^{IP}$ for $j = n/3$ and the sample size $n = 1, \ldots, 100$.
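
For the g-prior with $g = n$ the critical region has a closed-form endpoint, so its Type I error can be approximated by simulation. The sketch below is not the authors' exact computation: it is a Monte Carlo illustration that assumes the standard normal-theory result that, under $M_0$, the statistic (5) follows a Beta$((n-j-1)/2, j/2)$ distribution, and its function names are hypothetical.

```python
import math
import random


def log_bf_g_equals_n(stat, n, j):
    """log of the g = n Bayes factor in expression (2)."""
    return 0.5 * (n - j - 1) * math.log1p(n) - 0.5 * (n - 1) * math.log1p(n * stat)


def rejection_threshold_g_equals_n(n, j):
    """Value of the statistic at which expression (2) equals 1.

    With Pr(M_0) = Pr(M_j) = 1/2, M_0 is rejected when the statistic is
    below this threshold, i.e. when the Bayes factor exceeds 1.
    """
    return (math.exp((n - j - 1) / (n - 1) * math.log1p(n)) - 1.0) / n


def type_one_error_mc(n, j, draws=20000, seed=0):
    """Monte Carlo estimate of the Type I error of the g = n procedure."""
    rng = random.Random(seed)
    cut = rejection_threshold_g_equals_n(n, j)
    a, b = (n - j - 1) / 2, j / 2  # assumed null distribution Beta(a, b)
    hits = sum(1 for _ in range(draws) if rng.betavariate(a, b) < cut)
    return hits / draws
```

Calling `type_one_error_mc(n, n // 3)` over a grid of sample sizes gives a simulated counterpart of the $g = n$ curve described for Figure 1; the mixture and intrinsic procedures would require solving for their thresholds numerically.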

From Figure 1 it follows that the Type I error probabilities of the procedures based on g-priors are very close to each other and smaller than that of the procedure based on the intrinsic priors. We note that as $n$ and $j$ increase at the same pace, $n/j = 3$, the Type I error probabilities for the procedures based on g-priors go to zero faster than that of the procedure based on the intrinsic priors.

In Figure 2 we display, for $\delta_{j0} = 1.5$ and $j = n/3$, the power function of the above procedures as a function of the sample size $n = 1, \ldots, 100$.

From Figure 2 we observe that the power of the procedure based on intrinsic priors is much larger than that of the procedures based on the g-priors. This is the price the procedures based on g-priors pay for their very small Type I error probabilities. Further, the power for the intrinsic priors and the mixture of g-priors increases to one as the sample size n and the model dimension j increase at the same pace, that is, n/j = r > 1, but the power for the g-prior with g = n increases as n increases up to a certain n and then 


[Figure 2: power function for n/j = 3 and $\delta$ = 1.5; vertical axis: power; horizontal axis: n, from 0 to 100.] 


Fig. 2. Power for the Bayes factor for intrinsic priors (continuous line), for g-priors for g = n (dot-dashed) and for the mixture of g-priors (dashed). 

decreases, which is surprisingly unreasonable behavior. This anomalous behavior of the Bayes factor for g = n is due to the inconsistency of this Bayes factor for any model $M_j$ such that j = O(n), a point that we discuss in Section 6 and summarize in Table 2. 

On the other hand, we remark that as $\delta_{j0}$ increases, the power of the three procedures increases for any sample size n. 

Figures 1 and 2 indicate how unbalanced the Type I and II error probabilities of the Bayesian procedures based on g-priors are compared with those of the procedure based on the intrinsic priors. The practical implication of this analysis is that for moderate sample sizes the Bayesian procedures based on g-priors are strongly biased toward the null model. 

6. CONCLUDING REMARKS 

Variable selection in regression is a central problem in statistical inference and the aim of this paper has been to evaluate the sampling properties of Bayesian model selection procedures, a requirement long advocated by many statisticians. For some interesting applications the number of regressors is very large, and hence we assumed that the potential number of regressors k grows with n. We very soon realized that the variable selection takes place in a large class of models, and hence posterior model consistency seems to be the appropriate asymptotic property to be explored, a concept that depends on the priors over the models and the model parameters. Posterior model consistency for variable selection for three popular Bayes factors and two types of model priors has been explored, although the methodology we used can be applied to any other specific Bayes factor and model prior. 

For low-dimensional normal regression models it is well known that virtually any Bayes factor has an asymptotic approximation which is equivalent to the Schwarz approximation, which assures consistency. However, for large-dimensional models more appropriate asymptotic approximations for the Bayes factors, such as those given in Lemma 2, are necessary for analyzing consistency. 
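
The low-dimensional statement can be checked numerically for one Bayes factor. The sketch below is an illustration under assumptions not taken from this section: it uses the standard closed form of the g-prior Bayes factor with g = n, written in terms of b, the ratio of residual sums of squares, and the usual Schwarz (BIC) form $-(n/2)\log b - (j/2)\log n$. For fixed j and b the difference of the two log Bayes factors settles at a constant, which is negligible next to the leading terms of order n and log n. All function names are hypothetical.

```python
import math


def log_bf_g_equals_n(b: float, n: int, j: int) -> float:
    """Log Bayes factor of M_j versus M_0 for the g-prior with g = n.

    Assumes the standard closed form in terms of b, the ratio of
    residual sums of squares of M_j and M_0.
    """
    return 0.5 * (n - j - 1) * math.log1p(n) - 0.5 * (n - 1) * math.log1p(n * b)


def log_bf_schwarz(b: float, n: int, j: int) -> float:
    """Schwarz (BIC) approximation: -(n / 2) log b - (j / 2) log n."""
    return -0.5 * n * math.log(b) - 0.5 * j * math.log(n)


def limiting_gap(b: float) -> float:
    """Limit, for fixed j, of the difference of the two log Bayes factors."""
    return 0.5 * (math.log(b) + 1.0 - 1.0 / b)


if __name__ == "__main__":
    b, j = 0.8, 3
    for n in (10**2, 10**4, 10**6):
        gap = log_bf_g_equals_n(b, n, j) - log_bf_schwarz(b, n, j)
        print(n, gap, limiting_gap(b))
```

When j grows with n the dropped terms are no longer negligible, which is the reason sharper approximations are needed in the large-dimensional case.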

Although we considered the independent Bernoulli class of model priors $\{\pi(M|\theta), \theta \in (0,1), M \in \mathfrak{M}\}$ and the hierarchical uniform prior $\pi^{\mathrm{HU}}(M)$, a mixture of $\pi(M|\theta)$ with respect to the uniform distribution on $\theta$, the asymptotic results for the hierarchical uniform prior can be formally extended to other regular mixtures of Bernoulli model priors. 
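
To make the two model priors concrete, here is a minimal Python sketch. It assumes the usual form of the independent Bernoulli prior for a model with j of the k potential regressors, $\theta^j(1-\theta)^{k-j}$, and integrates it against the uniform distribution on $\theta$ to obtain the hierarchical uniform prior, $j!(k-j)!/(k+1)!$. The function names are hypothetical; exact rational arithmetic is used so that the identities can be checked without rounding.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_prior(j: int, k: int, theta: Fraction) -> Fraction:
    """Prior probability of one specific model with j of k regressors."""
    return theta**j * (1 - theta) ** (k - j)


def hierarchical_uniform_prior(j: int, k: int) -> Fraction:
    """Integral of theta^j (1 - theta)^(k - j) over theta in (0, 1)."""
    return Fraction(1, (k + 1) * comb(k, j))


def hu_prior_ratio(j: int, t: int, k: int) -> Fraction:
    """Ratio of hierarchical uniform prior probabilities of M_j and M_t."""
    return hierarchical_uniform_prior(j, k) / hierarchical_uniform_prior(t, k)


if __name__ == "__main__":
    k = 10
    for j in range(k + 1):
        print(j, bernoulli_prior(j, k, Fraction(1, 2)), hierarchical_uniform_prior(j, k))
```

Under the hierarchical uniform prior each model dimension receives total mass 1/(k+1), whereas the Bernoulli prior with fixed $\theta$ concentrates mass on dimensions near $k\theta$.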

The conclusions on posterior model consistency 
we draw for the above Bayesian procedures when 
sampling from an arbitrary but fixed model Mt and 
for different rates of growth of k are summarized in 
Table 1. 

Table 1 implies that when sampling from $M_t$, the Bayesian procedures for the Bayes factors $B_{j0}^{g=n}$, $B_{j0}^{\mathrm{Mix}}$ and $B_{j0}^{\mathrm{IP}}$ and the Bernoulli model prior are inconsistent for 1/2 < b < 1, but for the hierarchical uniform model prior they are consistent for any 0 < b < 1. Thus, a first conclusion is that the hierarchical uniform model prior $\pi^{\mathrm{HU}}(M)$ outperforms the independent Bernoulli model prior $\pi(M|\theta)$. 

We remark that the above results are valid when sampling from a fixed model $M_t$ with finite dimension. The analysis of the infinite dimensional case is an open problem that deserves more effort, as the Bayes factors are now not necessarily consistent (Moreno, Giron and Casella, 2010) and, consequently, the posterior model consistency results differ from those presented above. For instance, for t = O(n), the Bayes factor $B_{t0}^{g=n}$ is such that 

\[ \lim_{n \to \infty} B_{t0}^{g=n} = 0, \quad [M_t], \]

so that it is inconsistent under any model $M_t \neq M_0$, and this implies that it is also posterior model inconsistent under $M_t \neq M_0$ for any model prior. 

For the Bayes factors $B_{t0}^{\mathrm{Mix}}$ and $B_{t0}^{\mathrm{IP}}$ the situation is not so dramatic. The set of alternative models $M_t$ for which inconsistency of $B_{t0}^{\mathrm{Mix}}$ holds is a small set of models around $M_0$ that satisfy the condition 

<5*o < <5mix(r-) = ^1 - (er) 1/(r - 1} - 1, 


where r = n/t > 1. Likewise, the set of alternative models $M_t$ for which $B_{t0}^{\mathrm{IP}}$ is inconsistent is that given by the condition 

5* 0 (r) < <5ip(r) = -- 7 —-rj - 1 . 

A summary of these results is given in Table 2. 

From Table 2 we can draw the conclusion that the intrinsic priors and the mixture of g-priors are preferred to the g-prior for g = n. 

We also note that $\delta_{\mathrm{IP}}(r) < \delta_{\mathrm{mix}}(r)$, so that the inconsistency region of the Bayes factor for the intrinsic priors is smaller than that for the mixture of g-priors. Further, for the case where r = 1 it can be shown that the Bayes factor $B_{t0}^{\mathrm{Mix}}$ is inconsistent for any alternative model $M_t$, while the Bayes factor $B_{t0}^{\mathrm{IP}}$ is inconsistent only for those $M_t$ such that $\delta_{t0} < 1/\log 2 - 1$. 

On the other hand, for small and moderate sample sizes, Figures 1 and 2 indicate that the Bayes factors $B_{j0}^{g=n}$ and $B_{j0}^{\mathrm{Mix}}$ are strongly biased toward the null model, while the 


Table 1 

Posterior model consistency when sampling from $M_t$, as a function of the Bayes factor, model prior and the rate of growth of $k = O(n^b)$ 

Model prior     $\pi(M|\theta)$                                                      $\pi^{\mathrm{HU}}(M)$ 
Bayes factor    $B_{j0}^{g=n}$, $B_{j0}^{\mathrm{Mix}}$, $B_{j0}^{\mathrm{IP}}$      $B_{j0}^{g=n}$, $B_{j0}^{\mathrm{Mix}}$, $B_{j0}^{\mathrm{IP}}$ 
0 < b < 1/2     Consistent                                                           Consistent 
1/2 < b < 1     Inconsistent                                                         Consistent 


Table 2 

Posterior model consistency when sampling from $M_t$, for t = n/r, r > 1, and $\pi(M_0) > 0$ 

Bayes factor                Model prior        Posterior model consistency 
$B_{t0}^{g=n}$              $\pi(M_0) > 0$     Inconsistent under any $M_t$ 
$B_{t0}^{\mathrm{Mix}}$     $\pi(M_0) > 0$     Inconsistent under $M_t$ such that $\delta_{t0} < \delta_{\mathrm{mix}}(r)$ 
$B_{t0}^{\mathrm{IP}}$      $\pi(M_0) > 0$     Inconsistent under $M_t$ such that $\delta_{t0} < \delta_{\mathrm{IP}}(r)$ 

Bayes factor for the intrinsic priors $B_{j0}^{\mathrm{IP}}$ has more balanced Type I and II error probabilities. 

Therefore, the overall conclusion from our analysis is that the intrinsic priors over the model parameters and the hierarchical uniform prior over the models are nowadays the priors to be recommended for variable selection in normal regression. 


For any j and n, the integral in this expression has value 

\[ \int_0^1 y^{(1/B_{j0}) - 1 + (j/n)} \left(-\frac{1}{\log y}\right)^{(1-j)/2} dy = \Gamma\!\left(\frac{j+1}{2}\right) B_{j0}^{(j+1)/2} \left(1 + \frac{j}{n} B_{j0}\right)^{-(j+1)/2}. \]
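
The closed form of an integral of this type can be checked numerically. The sketch below assumes the integral is $\int_0^1 y^{a-1}(-\log y)^{s-1}\,dy$ with $a = 1/B_{j0} + j/n$ and $s = (j+1)/2$, whose value is $\Gamma(s)/a^s$; after the substitution $u = -\log y$ it becomes a Gamma integral, which is evaluated here with a composite Simpson rule. The function names, the truncation point and the number of steps are illustrative choices, and the quadrature is only meant for $s \ge 1$ (that is, $j \ge 1$).

```python
import math


def closed_form(a: float, s: float) -> float:
    """Gamma(s) / a^s, the value of the integral for a > 0 and s > 0."""
    return math.gamma(s) / a**s


def simpson_integral(a: float, s: float, upper: float = 40.0, steps: int = 200_000) -> float:
    """Composite Simpson rule for exp(-a u) u^(s - 1) on [0, upper].

    This is the integral over y in (0, 1) after substituting u = -log y.
    The tail beyond `upper` is ignored, which is harmless for a >= 1.
    """
    def integrand(u: float) -> float:
        return math.exp(-a * u) * u ** (s - 1.0)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    j, n, b_j0 = 4, 20, 0.5          # illustrative values only
    a, s = 1.0 / b_j0 + j / n, (j + 1) / 2.0
    print(simpson_integral(a, s), closed_form(a, s))
```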


APPENDIX A: PROOF OF LEMMA 1 


Part (i) is immediate and hence it is omitted. Part (ii) follows by first making the change of variables $y = \exp[-n/(2g)]$ in the integral in (3). The Jacobian of the inverse transformation is 

\[ J = \frac{dg}{dy} = \frac{n}{2 y \log^2 y} \]

and, thus, the integral in (3) now becomes 

\[ \int_0^1 \left(1 - \frac{n}{2\log y}\right)^{(n-j-1)/2} \left(1 - \frac{n B_{j0}}{2\log y}\right)^{(-n+1)/2} \left(-\frac{n}{2\log y}\right)^{-3/2} y\, J \, dy. \]

The first factor in this integral can be approximated by 

\[ \left(1 - \frac{n}{2\log y}\right)^{(n-j-1)/2} \approx y^{-1+(j/n)} \left(-\frac{n}{2\log y}\right)^{(n-j-1)/2} \]

and the second by 

\[ \left(1 - \frac{n B_{j0}}{2\log y}\right)^{(-n+1)/2} \approx B_{j0}^{(1-n)/2}\, y^{1/B_{j0}} \left(-\frac{n}{2\log y}\right)^{(1-n)/2}. \]
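
The quality of these two approximations can be examined on the log scale. The sketch below assumes that the factors are $(1+g)^{(n-j-1)/2}$ and $(1+gB_{j0})^{(-n+1)/2}$ with $g = -n/(2\log y)$, and that their approximations are $y^{-1+j/n} g^{(n-j-1)/2}$ and $B_{j0}^{(1-n)/2} y^{1/B_{j0}} g^{(1-n)/2}$. It returns the log of exact over approximate, which should shrink toward zero as n grows for fixed y, j and $B_{j0}$. The function names and test values are hypothetical.

```python
import math


def log_error_first_factor(y: float, n: int, j: int) -> float:
    """log(exact / approximation) for the factor with exponent (n - j - 1) / 2."""
    g = -n / (2.0 * math.log(y))
    exact = 0.5 * (n - j - 1) * math.log1p(g)
    approx = (-1.0 + j / n) * math.log(y) + 0.5 * (n - j - 1) * math.log(g)
    return exact - approx


def log_error_second_factor(y: float, n: int, b_j0: float) -> float:
    """log(exact / approximation) for the factor with exponent (-n + 1) / 2."""
    g = -n / (2.0 * math.log(y))
    exact = 0.5 * (1 - n) * math.log1p(g * b_j0)
    approx = 0.5 * (1 - n) * math.log(b_j0) + math.log(y) / b_j0 + 0.5 * (1 - n) * math.log(g)
    return exact - approx


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, log_error_first_factor(0.3, n, 10), log_error_second_factor(0.3, n, 0.6))
```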


Plugging these approximations in the integral, and after some simplifications, we obtain that the original Bayes factor can be approximated as 


and thus the approximation of the Bayes factor is 

\[ B_{j0}^{\mathrm{Mix}}(y, X) \approx \left(\frac{n}{2}\right)^{-j/2} \left(1 + \frac{j}{n} B_{j0}\right)^{-(j+1)/2} B_{j0}^{-(n-j-2)/2}\, \frac{\Gamma((j+1)/2)}{\Gamma(1/2)}. \]

If b < 1, we have that 

\[ \lim_{n \to \infty} \left(1 + \frac{j}{n} B_{j0}\right)^{-(j+1)/2} = 1, \]

and this proves the first part of (ii). If b = 1, the proof follows directly from the expression of the approximation. This completes the proof of part (ii). 

Part (iii) was proved in Giron et al. (2010) and hence it is omitted. 


APPENDIX B: PROOF OF THEOREM 3 


1. We first prove that condition (A) holds for the Bayes factor $B_{j0}^{g=n}$, the Bernoulli model prior $\pi(M_j|\theta)$ and 0 < b < 1/2, and that it does not hold for 1/2 < b < 1. Under the Bernoulli prior we have that 
( A > = E £ + j 

j =0 Mj&tilj 

Mj^Mt 


■ exp 


1 

2 


f 5 ji _ 6 Jo\ 

V 1 + ^- J 


9 

1-9 


j-t 


From Lemma 3, the terms for j < t go to zero as n tends to infinity. For j > t let us split the class $\mathfrak{M}_j$ as 

\[ B_{j0}^{\mathrm{Mix}}(y, X) \approx \frac{(n/2)^{-j/2}}{\Gamma(1/2)}\, B_{j0}^{(-n+1)/2} \int_0^1 y^{(1/B_{j0}) - 1 + (j/n)} \left(-\frac{1}{\log y}\right)^{(1-j)/2} dy. \]



\[ \mathfrak{M}_j = \mathfrak{N}_j \cup (\mathfrak{M}_j - \mathfrak{N}_j), \]

where $\mathfrak{N}_j$ is the class of models $M_j$ such that $M_t$ is nested in $M_j$. From Lemma 3, it follows that $\delta_{tj} = 0$ for any $M_j \in \mathfrak{N}_j$, and $\delta_{tj} > 0$ for $M_j \in \mathfrak{M}_j - \mathfrak{N}_j$. Therefore, for large n the contribution of the models in $\mathfrak{M}_j - \mathfrak{N}_j$ to the sum in (A) tends to zero, and we then have for large n that 
(t-j )/2 


(N» E E 


n 


j — t-\-\ Mj GO Xj 


exp 




1 + 5' 


k—t 

E 

i= 1 


k — t 
i 


n 


-i/2 


i-e 


= -i+ i+ 


e 


tj 


k—t 


1-9 


j-t 


exr 


(A)» E E 

j —1+1 A^j'GOXj 


n (*-j)/ 2 exp 




k—t 

E- 


= z^ n 

2—1 

OO 


—i/2 (* + *)! 


Z! 


<E ""‘ /2 

2—1 


-i/2 (* + *)! 


Z! 


(A) ~ (t + 2) 


~t/2 sr 


n 


\j + 2 
j=t+i v 


—j/2 


n 


i/2 


A: — f 


^-4 (A: — i)(/c — t — 1 ) • • • (k — t — i + 1 ) 

CL j 


2—1 


n 


i/2 


where 


&i — 


(t + z + 2)( t+i )/ 2 


(1 — 0)n 1 / 2 / 

<p {n b ~ 1/2 }. 

Then, for large n, (A) is equivalent to $\exp\{n^{b-1/2}\}$ and this proves the assertions. 

2. We now prove that for the Bayes factor $B_{j0}^{g=n}$ and the hierarchical uniform prior $\pi^{\mathrm{HU}}(M)$, condition (A) does hold for any b < 1. Indeed, using again the decomposition $\mathfrak{M}_j = \mathfrak{N}_j \cup (\mathfrak{M}_j - \mathfrak{N}_j)$, the sum (A) can be approximated for large n as 



It can be seen that the sequence $\{a_i\}$ increases as i increases for $i < i_0(t)$, where $i_0(t) \approx [1 + 1.65\sqrt{t}]$, and decreases for $i > i_0(t)$, and thus it is bounded by some function of t, say, a(t). Thus, the sum in (A) is upper bounded as 

\[ (A) \le (t+2)^{-t/2}\, a(t) \sum_{i=1}^{k-t} \frac{(k-t)(k-t-1)\cdots(k-t-i+1)}{n^{i/2}}, \]

which, for b < 1/2, converges to 0 as n tends to infinity. A similar lower bound for (A) shows that for b > 1/2 the sum cannot converge to zero. 
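
The role of the threshold b = 1/2 can be seen by evaluating a sum of this shape directly. The sketch below assumes the sum is $\sum_{i=1}^{k-t} (k-t)(k-t-1)\cdots(k-t-i+1)/n^{i/2}$ with k taken as the integer part of $n^b$; the function name and the chosen values of n, b and t are hypothetical. Each term is bounded by $((k-t)/\sqrt{n})^i$, so the sum is small when $k \ll \sqrt{n}$ and explodes when $k \gg \sqrt{n}$.

```python
def falling_factorial_sum(n: int, b: float, t: int) -> float:
    """Sum over i = 1..k-t of (k-t)(k-t-1)...(k-t-i+1) / n^(i/2), k = int(n^b)."""
    k = int(n**b)
    m = k - t
    root_n = n**0.5
    total, term = 0.0, 1.0
    for i in range(1, m + 1):
        # Multiply in the next factor of the falling factorial, scaled by sqrt(n).
        term *= (m - i + 1) / root_n
        total += term
    return total


if __name__ == "__main__":
    for n in (10**4, 10**6, 10**8):
        print(n, falling_factorial_sum(n, 0.3, 2), falling_factorial_sum(n, 0.7, 2))
```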

For the Bayes factor $B_{j0}^{\mathrm{Mix}}$ the proof of the posterior model consistency is similar and hence omitted. 

4. We now prove that for $B_{j0}^{\mathrm{IP}}$ and $\pi^{\mathrm{HU}}(M_j)$ posterior consistency holds for b < 1. For large n we have that 

(A)«(t + 2)- 4 / 2 

k (~ J/ 2 n t/ 2 ( k ~ iKk-j) '- 

j=t +1 

which simplifies to 


,?;Ai + 2 f " \i-t) t\{k-t)\' 


\[ = t!\left(-1 + (1 - n^{-1/2})^{-(t+1)}\right). \]

The last expression tends to zero as n tends to infinity, and this proves the assertion. 
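
A closed form of this kind comes from the negative binomial series $\sum_{i \ge 1} \binom{t+i}{i} x^i = (1-x)^{-(t+1)} - 1$ for $|x| < 1$. The sketch below checks that identity with $x = n^{-1/2}$ and shows the value shrinking as n grows; that this is the series used at this step of the proof is an assumption, and the function names are hypothetical. Multiplying both sides by t! gives the version written with $(t+i)!/i!$.

```python
from math import comb


def partial_sum(n: int, t: int, terms: int) -> float:
    """Sum over i = 1..terms of C(t + i, i) * n^(-i/2)."""
    x = n**-0.5
    return sum(comb(t + i, i) * x**i for i in range(1, terms + 1))


def series_limit(n: int, t: int) -> float:
    """Value of the infinite series: (1 - n^(-1/2))^(-(t + 1)) - 1."""
    return (1.0 - n**-0.5) ** (-(t + 1)) - 1.0


if __name__ == "__main__":
    t = 3
    for n in (10**2, 10**4, 10**8):
        print(n, partial_sum(n, t, 200), series_limit(n, t))
```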

3. Let us now consider the Bayes factor $B_{j0}^{\mathrm{IP}}$ and the Bernoulli prior. For simplicity we prove the assertion for $\theta = 1/2$, as the proof for any $\theta$ follows the same line of reasoning. We first note that the contribution of the models in $\mathfrak{M}_j - \mathfrak{N}_j$ to the sum in (A) tends to zero as n tends to infinity. Thus, we have for large n that 


k / r . \ “J / 2 „-| 

(A)«(t + 2 )-</ 2 E (7-0' "‘ /2 — 

j=t +1 ^ 


U ~t) 1 ' 


Making the change of variable i = j - t, the expression adopts the form 

\[ (A) \approx (t+2)^{-t/2} \sum_{i=1}^{k-t} \frac{b_i}{n^{i/2}}, \]

where 

\[ b_i = (t+i+2)^{(t+i)/2}\, \frac{(t+i)!}{i!}. \]


which, after the change of variables i = j — t, adopts 
the form 

(A)«(f + 2)- 4 / 2 


Every individual term $b_i/n^{i/2}$ in the sum converges to 0 as n tends to infinity, and for large values of i, the summands $b_i/n^{i/2}$ can be approximated by 

\[ \frac{e^{t/2+1}\, i^{(i+3t)/2}}{n^{i/2}}. \]

For every t, this function of i is decreasing for all $i < i_0$ and increasing for $i > i_0$, where $i_0$ is given by 

\[ i_0 = -\frac{3t}{W(-3et/n)} \approx n/e - 3t. \]
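
The location of this turning point can be confirmed without evaluating W. The sketch below assumes the function of i is $e^{t/2+1} i^{(i+3t)/2}/n^{i/2}$, so that the derivative of its logarithm vanishes when $\log(i/n) + 1 + 3t/i = 0$, and finds the larger root by bisection. The function names are hypothetical and the sketch requires $t \ge 1$ and $n > 3te^2$.

```python
import math


def stationarity(i: float, n: float, t: float) -> float:
    """Twice the derivative in i of log( i^((i + 3t)/2) / n^(i/2) )."""
    return math.log(i / n) + 1.0 + 3.0 * t / i


def turning_point(n: float, t: float) -> float:
    """Larger root of the stationarity equation, by bisection on [3t, n]."""
    lo, hi = 3.0 * t, float(n)  # negative at lo, positive at hi when n > 3 t e^2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if stationarity(mid, n, t) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n, t = 10**5, 2
    print(turning_point(n, t), n / math.e - 3 * t)
```

Since the turning point is of order n/e, any index range bounded by $k = O(n^b)$ with b < 1 eventually lies entirely to its left.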


But as $k = O(n^b)$ with b < 1, this implies that the sequence $b_i/n^{i/2}$ is decreasing in i for all i < k. Then, it follows that the sum is upper bounded as 

\[ \sum_{i=1}^{k-t} \frac{b_i}{n^{i/2}} < \frac{b_1}{n^{1/2}} + (k-t)\, \frac{b_2}{n}. \]

For $k = O(n^b)$ with b < 1, the limit of the right-hand side of this equation is 0 when n tends to infinity, and hence posterior model consistency holds. 

The proof of the consistency for the Bayes factor 
for the mixture of g-priors follows exactly the same 
pattern and it is therefore omitted. 

5. For b = 1, the proof of the posterior model consistency for $B_{j0}^{\mathrm{Mix}}$ and $\pi^{\mathrm{HU}}(M_j)$ runs as follows. For large n, it follows that, under the alternative model $M_t$, 

BT~ ((ne)/(* + l ))-'/ 2 

(1 + - j/n)- (, *-J-' 2)/2 

(1 — t/n)~b n ~ t ~‘ 2 )/ 2 

The ratio $\pi^{\mathrm{HU}}(M_j)/\pi^{\mathrm{HU}}(M_t)$ of the model probabilities for the hierarchical uniform prior is 

\[ \frac{\pi^{\mathrm{HU}}(M_j)}{\pi^{\mathrm{HU}}(M_t)} = \frac{j!\,(k-j)!}{t!\,(k-t)!}. \]

Then, reasoning as before, for large n, the double 
sum (A) of Theorem 1, after some simplifications, 
can be approximated as 

(A)«(f + 1)-*/ 2 


k 


y. 

j=t +1 




j! 

{j-ty 



-{n-j- 2)/2 


Making the change of variable i = j - t, some further simplifications on the factorials yield the approximating expression 


(A) 


e -3t/2 g —*/2 r (*+1/2) ^ + ^(3*+3t)/2 

(t + 1 )*/ 2 n 1 ! 2 



— (n—i—t—2)/2 


Letting x = i/k and s = n/k, the sum in the preceding expression can be approximated, up to a constant, by the integral $\int_0^1 f_k(x|s,t)\,dx$, where 


fk(x\s,t) = k(kx)- kx - {1 / 2) (ks)-^ kx)/2 

. e -{l/2)kxf kx + t j(3fcs)/2+(3t)/2+l/2 

s )+ i +2) 


1 - 


ks 


We now prove that $\lim_{k\to\infty} \int_0^1 f_k(x|s,t)\,dx = 0$ for any t = 0, 1, 2, ... and s > 1. 

For any k, t and s > 1, $f_k(x|s,t) > 0$. For t = 0, we have that $f_k(x|s,0)$ is a decreasing function of x for all k and s > 1, and such that $f_k(0|s,0) = k$. Further, $\lim_{k\to\infty} f_k(x|s,0) = 0$ for all $x \in (0,1]$. For t = 1, 2, ..., even though $f_k(x|s,t)$ is not a decreasing function of x, except for large values of x, we have that $f_k(x|s,t) = 0$, and $\lim_{k\to\infty} f_k(x|s,0) = 0$ for all $x \in (0,1]$. 

Thus, for any t, the limit of $f_k(x|s,t)$ when k goes to infinity is given by 

\[ \lim_{k\to\infty} f_k(x|s,t) = \begin{cases} \infty, & \text{if } x = 0, \\ 0, & \text{if } x \in (0,1], \end{cases} \]

and thus 

\[ \int_0^1 \lim_{k\to\infty} f_k(x|s,t)\,dx = 0. \]

On the other hand, $f_k(x|s,t)$ is a decreasing function of s and, therefore, $f_k(x|s,t) < f_k(x|1,t)$. Moreover, for every t = 0, 1, 2, ... there exists an integrable function u(x|t), such that 

\[ f_k(x|s,t) < u(x|t), \]

for large values of k. For instance, the function $u(x|t) = 10^t\, \mathrm{Ga}(x|0.1, 1)$, where Ga(x|0.1, 1) denotes the Gamma density with parameters 0.1 and 1, is an upper bound of $f_k(x|s,t)$. 

Applying the dominated convergence theorem to the sequence $\{f_k(x|s,t), k \ge 1\}$, we have that 

\[ \lim_{k\to\infty} \int_0^1 f_k(x|s,t)\,dx = \int_0^1 \lim_{k\to\infty} f_k(x|s,t)\,dx = 0, \]

and this completes the posterior model consistency 
proof for the Bayes factor based on the mixture of 
g-priors and the hierarchical uniform prior. 

A similar proof can be given for the Bayes factors 
for g = n and for the intrinsic priors. This completes 
the proof of Theorem 3. 

